Refactoring tutorial #440

lidiazuin · 2025-01-17T15:21:39Z

No description provided.

neo-technology-commit-status-publisher · 2025-01-23T14:34:40Z

This PR includes documentation updates
View the updated docs at https://neo4j-docs-getting-started-440.surge.sh

New pages:

Tutorial: Refactor a graph data model

nmervaillie

Very valuable addition to the docs.
Just a few comments

nmervaillie · 2025-01-24T16:03:01Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+
+The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship.
+
+== Eliminating duplicated data


Technically, we're not deleting data in this section.
Rename to "Reshaping the data"?

nmervaillie · 2025-01-24T16:03:30Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+* *In order to perform the query, all `Movie` nodes must be retrieved* -> As the graph scales, the performance of a similar query can be dimished by the way you modeled your data.
+* *The name of the language is duplicated in many `Movie` nodes (in this case, all of them)* -> If many nodes share a same property value, it could be a sign that this property value could instead become a new entity, like a node or a relationship, for example.
+
+The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship.


what about a visual to represent the data model before/after?

nmervaillie · 2025-01-24T16:14:54Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+[source,cypher]
+--
+MATCH (m:Movie)
+MATCH (l:Language)


This does not work: a `WHERE` clause is missing to keep only the `Language` nodes that match each movie's languages.

Even with that fix, this will only work on small graphs; mid-size graphs and above require sub-transactions.

A unique constraint is also missing on the language identifier.

In practice, we would do all of this in a single query for efficiency:

CREATE CONSTRAINT unique_language FOR (n:Language) REQUIRE n.name IS UNIQUE

Note: untested query

:auto <1>
MATCH (m:Movie)
CALL (m) {
  UNWIND m.languages AS language
  MERGE (l:Language {name: language})
  MERGE (m)-[:IN_LANGUAGE]->(l)
  SET m.languages = null
} IN TRANSACTIONS OF 10000 ROWS

<1> `:auto` is required in Neo4j Browser so that the query runs as an implicit (auto-commit) transaction, which `IN TRANSACTIONS` needs.
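
For reference, the missing `WHERE` clause on its own could look like the sketch below. It assumes that `languages` is a list property on `Movie`, that the `Language` nodes already exist with a `name` property, and that the relationship type is `IN_LANGUAGE` as in the query above. Like that query, it is untested.

```cypher
// Minimal fix: connect each movie only to the languages it actually lists.
// Assumes m.languages is a list of strings and l.name holds the language name.
MATCH (m:Movie)
MATCH (l:Language)
WHERE l.name IN m.languages
MERGE (m)-[:IN_LANGUAGE]->(l)
```

Without the `WHERE` clause, the two `MATCH` clauses produce every movie/language combination, so every movie would be linked to every language. This version still runs in a single transaction, so the sub-transaction concern stays open for larger graphs.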

nmervaillie · 2025-01-24T16:19:43Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+One way to improve your current model is to check for duplicate key values and see if you can turn them into another entity, like a node or a relationship.
+In this case, both production companies are based in California, so the state could be turned into a node for `State` and be connected to the producer companies via a new relationship `LOCATED_AT`:
+
+image::california.svg[The producer company nodes now have one less property for state and connect to a state node for California, role=popup]


For data consistency, the country property should also move to the State nodes
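
A rough sketch of what that could look like, for illustration only. The `ProductionCompany` label and the `state`, `country`, and `name` property names are assumptions, since the diff does not show them; `State` and `LOCATED_AT` come from the tutorial text.

```cypher
// Hypothetical label and property names; adjust to the tutorial's data model.
MATCH (c:ProductionCompany)
WHERE c.state IS NOT NULL                 // MERGE fails on null property values
MERGE (s:State {name: c.state})
  ON CREATE SET s.country = c.country     // country lives on the State node
MERGE (c)-[:LOCATED_AT]->(s)
REMOVE c.state, c.country                 // drop the now-duplicated properties
```

Keeping `country` on the `State` node means it is stored once per state instead of once per company.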

nmervaillie · 2025-01-24T17:18:22Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+RETURN m
+--
+
+| How many users rated a movie?


A more 'graphy' alternative could be

MATCH (m:Movie)
WHERE m.title = 'Apollo 13'
RETURN COUNT { (:User)-[:RATED]->(m) } AS `Number of reviewers`

nmervaillie · 2025-01-24T17:23:58Z

modules/ROOT/pages/data-modeling/tutorial-refactoring.adoc

+
+This should be the result:
+
+image::query-plan.png[Screenshot of Browser featuring a query plan that shows the number of database hits when you retrieve all person nodes,400,400,role=popup]


The data in the screenshot looks suspicious.
I would not expect 0 rows in the execution pipeline unless the DB is empty.
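
One way to check this before retaking the screenshot, assuming the profiled query simply retrieves all `Person` nodes as the image alt text suggests (the exact query is not shown in this hunk):

```cypher
// Step 1: confirm the database actually contains Person nodes.
MATCH (p:Person)
RETURN count(p) AS personCount;

// Step 2: profile the (assumed) tutorial query again.
// With data loaded, the plan should report a non-zero row count.
PROFILE
MATCH (p:Person)
RETURN p;
```

If `personCount` is 0, the screenshot was most likely taken before the dataset was loaded.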

lidiazuin added 2 commits January 17, 2025 16:05

Refactoring tutorial

3ae4764

Refactoring tutorial

8249c51

lidiazuin requested a review from AlexicaWright January 17, 2025 15:21

lidiazuin added 2 commits January 21, 2025 15:57

content nav

668db2c

undoing changes

1629482

nmervaillie suggested changes Jan 24, 2025
